Large Language Models are Zero-Shot Reasoners
Pretrained large language models (LLMs) are widely used in many sub-fields of
natural language processing (NLP) and generally known as excellent few-shot
learners with task-specific exemplars. Notably, chain of thought (CoT)
prompting, a recent technique for eliciting complex multi-step reasoning
through step-by-step answer examples, achieved state-of-the-art
performance on arithmetic and symbolic reasoning, difficult system-2 tasks
that do not follow the standard scaling laws for LLMs. While these successes
are often attributed to LLMs' ability for few-shot learning, we show that LLMs
are decent zero-shot reasoners by simply adding "Let's think step by step"
before each answer. Experimental results demonstrate that our Zero-shot-CoT,
using the same single prompt template, significantly outperforms standard
zero-shot prompting on diverse benchmark reasoning tasks, including arithmetic
(MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin
Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled
Objects), without any hand-crafted few-shot examples, e.g., increasing the
accuracy on MultiArith from 17.7% to 78.7% and on GSM8K from 10.4% to 40.7% with
the 175B-parameter InstructGPT model, as well as improvements of similar
magnitude with another off-the-shelf large model, the 540B-parameter PaLM. The versatility of
this single prompt across very diverse reasoning tasks hints at untapped and
understudied fundamental zero-shot capabilities of LLMs, suggesting that
high-level, broad multi-task cognitive capabilities may be extracted by simple prompting.
We hope our work not only serves as the minimal yet strongest zero-shot baseline
for challenging reasoning benchmarks, but also highlights the importance of
carefully exploring and analyzing the enormous zero-shot knowledge hidden
inside LLMs before crafting finetuning datasets or few-shot exemplars.
Comment: Accepted to NeurIPS 2022. Our code is available at
https://github.com/kojima-takeshi188/zero_shot_co
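As a concrete illustration of the method described above, here is a minimal sketch of the two-stage Zero-shot-CoT pipeline: the fixed trigger "Let's think step by step" elicits a reasoning trace, and a second call extracts the final answer. The complete() callable is a hypothetical stand-in for any LLM completion function, and the answer-extraction wording is an assumption based on the paper's pipeline, not quoted from the abstract.

    # Minimal sketch of Zero-shot-CoT with a generic completion function.
    def zero_shot_cot(question: str, complete) -> str:
        # Stage 1: the single fixed trigger elicits step-by-step reasoning.
        reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
        reasoning = complete(reasoning_prompt)
        # Stage 2: a second call extracts the final answer from the trace
        # (the extraction wording here is an illustrative assumption).
        answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer is"
        return complete(answer_prompt)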
Collective Intelligence for Object Manipulation with Mobile Robots
While natural systems often exhibit collective intelligence that allows them
to self-organize and adapt to changes, the equivalent is missing in most
artificial systems. We explore the possibility of such a system in the context
of cooperative object manipulation using mobile robots. Although prior work
demonstrates potential solutions to this problem in restricted settings, those
approaches face computational and learning difficulties. More importantly, such
systems lack the ability to adapt when facing environmental changes.
In this work, we show that by distilling a planner derived from a
gradient-based soft-body physics simulator into an attention-based neural
network, our multi-robot manipulation system can achieve better performance
than baselines. In addition, our system generalizes to configurations unseen
during training and is able to adapt toward task completion when external
disturbances and environmental changes are applied.
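The distillation step described above admits a simple supervised reading: train the attention-based network to reproduce the planner's actions on collected states. Below is a minimal sketch under that reading; the architecture, tensor shapes, and MSE objective are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class AttentionPolicy(nn.Module):
        # Attention lets each robot condition its action on the observations
        # of all other robots (shapes here are illustrative assumptions).
        def __init__(self, obs_dim: int, act_dim: int, d_model: int = 64):
            super().__init__()
            self.embed = nn.Linear(obs_dim, d_model)
            self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            self.head = nn.Linear(d_model, act_dim)

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            # obs: (batch, n_robots, obs_dim) -> actions: (batch, n_robots, act_dim)
            x = self.embed(obs)
            x, _ = self.attn(x, x, x)
            return self.head(x)

    def distillation_loss(policy, obs, planner_actions):
        # Regress the network's actions onto the planner's (teacher) actions.
        return nn.functional.mse_loss(policy(obs), planner_actions)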
Learning a Universal Human Prior for Dexterous Manipulation from Human Preference
Generating human-like behavior on robots is a great challenge, especially in
dexterous manipulation tasks with robotic hands. Even in simulation with no
sample constraints, scripting controllers is intractable due to the high degrees
of freedom, and manual reward engineering is difficult and can lead to
unrealistic motions. Leveraging recent progress in Reinforcement Learning
from Human Feedback (RLHF), we propose a framework to learn a universal human
prior using direct human preference feedback over videos, for efficiently
tuning the RL policy on 20 dual-hand robot manipulation tasks in simulation,
without a single human demonstration. A single task-agnostic reward model is
trained by iteratively generating diverse policies and collecting human
preferences over their trajectories; it is then applied to regularize the
behavior of policies in the fine-tuning stage. Our method empirically
demonstrates more human-like behaviors on robot hands across diverse tasks,
including unseen ones, indicating its generalization capability.
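To make the reward-model stage concrete, here is a minimal sketch of learning a trajectory-level reward from pairwise human preferences with a Bradley-Terry objective, the standard formulation in RLHF pipelines; the network shape and feature dimension are illustrative assumptions rather than the paper's implementation.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        def __init__(self, obs_dim: int, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, traj: torch.Tensor) -> torch.Tensor:
            # traj: (T, obs_dim); the trajectory return sums per-step rewards.
            return self.net(traj).sum()

    def preference_loss(model, traj_a, traj_b, a_preferred: bool):
        # Bradley-Terry: P(a preferred over b) = sigmoid(R(a) - R(b)).
        logit = model(traj_a) - model(traj_b)
        target = torch.tensor(1.0 if a_preferred else 0.0)
        return nn.functional.binary_cross_entropy_with_logits(logit, target)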
Bi-Manual Block Assembly via Sim-to-Real Reinforcement Learning
Most successes in robotic manipulation have been restricted to single-arm
gripper robots, whose low dexterity limits the range of solvable tasks to
pick-and-place, insertion, and object rearrangement. More complex tasks such
as assembly require dual and multi-arm platforms, but entail a suite of unique
challenges such as bi-arm coordination and collision avoidance, robust
grasping, and long-horizon planning. In this work, we investigate the
feasibility of training deep reinforcement learning (RL) policies in simulation
and transferring them to the real world (Sim2Real) as a generic methodology for
obtaining performant controllers for real-world bi-manual robotic manipulation
tasks. As a testbed for bi-manual manipulation, we develop the U-Shape Magnetic
Block Assembly Task, wherein two robots with parallel grippers must connect
three magnetic blocks to form a U-shape. Without manually designed controllers
or human demonstrations, we show that, with careful Sim2Real considerations,
our policies trained with RL in simulation enable two xArm6 robots to solve the
U-shape assembly task with a success rate above 90% in simulation and 50% on
real hardware without any additional real-world fine-tuning. Through careful
ablations, we highlight how each component of the system is critical for such
simple and successful policy learning and transfer, including task
specification, learning algorithm, direct joint-space control, behavior
constraints, perception and actuation noise, action delays, and action
interpolation. Our results present a significant step forward for bi-arm
capability on real hardware, and we hope our system can inspire future research
on deep RL and Sim2Real transfer of bi-manual policies, drastically scaling up
the capability of real-world robot manipulators.
Comment: Our accompanying project webpage can be found at:
https://sites.google.com/view/u-shape-block-assembly. arXiv admin note:
substantial text overlap with arXiv:2203.0827
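Two of the Sim2Real ingredients named in the ablations, action delays and action interpolation (together with actuation noise), can be sketched as a wrapper around the policy's actions during simulated training. The buffer length, noise scale, and halfway interpolation below are illustrative assumptions, not the paper's exact scheme.

    import random

    class DelayedInterpolatedActions:
        # Randomized action delay, linear interpolation toward the target,
        # and Gaussian actuation noise as simple domain randomization.
        def __init__(self, act_dim: int, max_delay: int = 2, noise_std: float = 0.01):
            self.max_delay = max_delay
            self.noise_std = noise_std
            self.buffer = [[0.0] * act_dim]  # recent policy actions
            self.prev = [0.0] * act_dim      # last action actually applied

        def step(self, action):
            # Queue the new action, then pick one delayed by 0..max_delay steps.
            self.buffer.append(list(action))
            self.buffer = self.buffer[-(self.max_delay + 1):]
            delayed = self.buffer[random.randint(0, len(self.buffer) - 1)]
            # Move halfway toward the (possibly stale) target and add noise.
            self.prev = [0.5 * (p + a) + random.gauss(0.0, self.noise_std)
                         for p, a in zip(self.prev, delayed)]
            return self.prev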
World Robot Challenge 2020 -- Partner Robot: A Data-Driven Approach for Room Tidying with Mobile Manipulator
Tidying up a household environment using a mobile manipulator poses various
challenges in robotics, such as adaptation to large real-world environmental
variations, and safe and robust deployment in the presence of humans. The
Partner Robot Challenge in the World Robot Challenge (WRC) 2020, a global
competition held in September 2021, benchmarked tidying tasks in real home
environments and, importantly, tested full-system performance. For this
challenge, we developed an entire household service robot system, which
leverages a data-driven approach to adapt to the numerous edge cases that occur
during execution, instead of relying on classical, manually pre-programmed solutions. In
this paper, we describe the core ingredients of the proposed robot system,
including visual recognition, object manipulation, and motion planning. Our
robot system won second prize, verifying the effectiveness and potential of
data-driven robot systems for mobile manipulation in home environments.